fix: Issue dedup relevance #26
QA: Using Euclidean distance. A 90% similarity threshold for a match and 75% for a warning works best right now.
Why did you use Euclidean distance? I don't have a lot of deep context on the problem you are solving, but the last time I did similar work (years ago) I used Dice's coefficient, which worked quite well. Can you implement it and compare performance? I just read Wikipedia and asked ChatGPT. Both suggest Dice is more appropriate for the task.
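For reference, a minimal sketch of Dice's coefficient over character bigrams, which is one common way to apply it to raw text. The helper names `bigramCounts` and `diceCoefficient` are hypothetical and are not part of this PR; the sketch compares strings directly rather than embedding vectors.

```typescript
// Illustrative only: Sørensen–Dice coefficient on character bigrams.
// These helpers are assumptions for discussion, not code from this repository.
function bigramCounts(text: string): Map<string, number> {
  // Lowercase and collapse whitespace so trivial formatting differences are ignored.
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  const counts = new Map<string, number>();
  for (let i = 0; i < normalized.length - 1; i++) {
    const gram = normalized.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

function diceCoefficient(a: string, b: string): number {
  const gramsA = bigramCounts(a);
  const gramsB = bigramCounts(b);
  let totalA = 0;
  let totalB = 0;
  gramsA.forEach((count) => (totalA += count));
  gramsB.forEach((count) => (totalB += count));
  // Strings shorter than two characters have no bigrams; treat them as dissimilar.
  if (totalA === 0 || totalB === 0) return 0;
  let overlap = 0;
  gramsA.forEach((countA, gram) => {
    overlap += Math.min(countA, gramsB.get(gram) || 0);
  });
  // Dice = 2 * |A ∩ B| / (|A| + |B|), always in the range [0, 1].
  return (2 * overlap) / (totalA + totalB);
}
```

For example, "night" and "nacht" share only the bigram "ht" out of four bigrams each, so the score is 2 * 1 / 8 = 0.25.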
src/handlers/issue-deduplication.ts
matchRepoOrgToSimilarIssueRepoOrg(payload.repository.owner.login, issue.node.repository.owner.login, payload.repository.name, issue.node.repository.name)
)
.map((issue) => {
const modifiedUrl = issue.node.url.replace("github.com", "www.github.com");
- const modifiedUrl = issue.node.url.replace("github.com", "www.github.com");
+ const modifiedUrl = issue.node.url.replace("https://github.com", "https://www.github.com");
Seems more precise.
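A small sketch of why anchoring on the scheme is more precise. The function names and the sample URLs are hypothetical, chosen only to show the difference; `String.prototype.replace` with a string pattern replaces the first occurrence wherever it appears.

```typescript
// Illustrative comparison of the two replacements; URLs below are made-up examples.
function prefixWwwLoose(url: string): string {
  // Matches the first "github.com" anywhere, including inside subdomains.
  return url.replace("github.com", "www.github.com");
}

function prefixWwwAnchored(url: string): string {
  // Only matches when the host is exactly github.com right after the scheme.
  return url.replace("https://github.com", "https://www.github.com");
}

const issueUrl = "https://github.com/org/repo/issues/1";
const gistUrl = "https://gist.github.com/user/abc";

console.log(prefixWwwLoose(issueUrl));    // https://www.github.com/org/repo/issues/1
console.log(prefixWwwAnchored(issueUrl)); // https://www.github.com/org/repo/issues/1
console.log(prefixWwwLoose(gistUrl));     // https://gist.www.github.com/user/abc
console.log(prefixWwwAnchored(gistUrl));  // https://gist.github.com/user/abc
```

Both versions behave the same on a plain issue URL; the anchored one simply leaves anything else untouched.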
Euclidean distance represents the straight-line distance between two points (vectors) in the vector space. Similar vectors tend to cluster closely together, while unrelated vectors are positioned further apart.
I believe that to compute the Dice coefficient, we need to generate bit vectors and perform operations on them. However, I tried using a similar metric,
4o:
I experimented with Manhattan distance (also known as taxicab distance or L1 distance) and found that it produced results similar to L2 distance (Euclidean distance). One advantage of L2 distance is that, when the vectors are normalized, the distance value is constrained within a unit circle around the vector point, meaning it often falls between 0 and 1. Both Euclidean and Manhattan distances are special cases of Minkowski distance, where p = 1 corresponds to Manhattan and p = 2 corresponds to Euclidean.
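The relationship between the two metrics can be sketched as follows. This is a minimal illustration with hypothetical helper names (`minkowskiDistance`, `manhattanDistance`, `euclideanDistance`); it is not the distance computation used by the plugin or its database.

```typescript
// Illustrative only: Minkowski distance of order p between two equal-length vectors.
function minkowskiDistance(a: number[], b: number[], p: number): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  if (p < 1) throw new Error("p must be at least 1");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += Math.pow(Math.abs(a[i] - b[i]), p);
  }
  return Math.pow(sum, 1 / p);
}

// p = 1 gives Manhattan (L1) distance: the sum of absolute differences.
function manhattanDistance(a: number[], b: number[]): number {
  return minkowskiDistance(a, b, 1);
}

// p = 2 gives Euclidean (L2) distance: the straight-line distance.
function euclideanDistance(a: number[], b: number[]): number {
  return minkowskiDistance(a, b, 2);
}

console.log(manhattanDistance([0, 0], [3, 4])); // 7
console.log(euclideanDistance([0, 0], [3, 4]).toFixed(2)); // 5.00
```

For unit-length vectors, the Euclidean distance d and the cosine similarity c are related by d² = 2(1 − c), so d ranges from 0 (identical direction) to 2 (opposite direction) and stays at or below 1 whenever c is at least 0.5.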
Updated the UI for the similar-issues message. It no longer creates a new comment; instead, it edits the issue specification itself to include details about similar issues. QA: Issue
What does that mean? I can edit anybody's specification with collaborator permissions. We just need to ensure the app has sufficient permissions, I'm sure.
Sorry, I meant to say that it wouldn’t create a comment; instead, it would edit directly within the issue specification.
I made changes here, but why is there no build CI?
Added a build test. Ref: ubiquity-os-marketplace/text-vector-embeddings#26 (comment)
Resolves #25